
    Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

    While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks rely on unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, even though it often produces vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining of deep models can improve LTR performance over GBDTs and other non-pretrained models. By incorporating simple design choices, including SimCLR-Rank, an LTR-specific pretraining loss, we produce pretrained deep learning models that consistently (across datasets) outperform GBDTs (and other non-pretrained rankers) when there is more unlabeled data than labeled data. This performance improvement occurs not only on average but also on outlier queries. We base our empirical conclusions on experiments with (1) public benchmark tabular LTR datasets, and (2) a large industry-scale proprietary ranking dataset. Code is provided at https://anonymous.4open.science/r/ltr-pretrain-0DAD/README.md. Comment: ICML-MFPL 2023 Workshop Ora
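The abstract names SimCLR-Rank only in passing and does not define it. As background, a generic SimCLR-style (NT-Xent) contrastive loss over two augmented views of a batch of tabular embeddings can be sketched as follows; this is an illustrative NumPy sketch of the standard SimCLR loss, not the paper's LTR-specific variant:

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """Generic SimCLR-style (NT-Xent) contrastive loss.

    z1, z2: arrays of shape (batch, dim); row i of z1 and row i of z2
    are two augmented views of the same item (a positive pair), and all
    other rows in the batch serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # (2N, dim)
    sim = z @ z.T / temperature                   # pairwise similarities
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    # Index of each row's positive partner: row i pairs with row n+i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Cross-entropy over each row's softmax, evaluated at the positive.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), pos]).mean()
```

When the two views agree (embeddings of positive pairs are close), the loss is low; mismatched views raise it, which is what drives pretraining without labels.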

    Reputation Systems and Incentives Schemes for Quality Control in Crowdsourcing

    Crowdsourcing combines the abilities of computers and humans to solve tasks that computers find difficult. In crowdsourcing, computers process and aggregate input solicited from human workers; thus, the quality of workers' input is crucial to the success of crowdsourced solutions. Performing quality control at scale is a difficult problem: workers can make mistakes, and computers alone, without human input, cannot verify the solutions. We develop reputation systems and incentive schemes for quality control in the context of different crowdsourcing applications. To have a concrete source of crowdsourced data, we built CrowdGrader, a web-based peer grading tool that lets students submit and grade solutions to homework assignments. In CrowdGrader, each submission receives several student-assigned grades, which are aggregated into the final grade using a novel algorithm based on a reputation system. We first give an overview of our work and the peer-grading results obtained via CrowdGrader. Then, motivated by our experience, we propose hierarchical incentive schemes that are truthful and cheap. The incentives are truthful because the optimal worker behavior is to provide accurate evaluations. The incentives are cheap because they leverage hierarchy, so that they can be effected with a small number of supervised evaluations, and the strength of the incentive does not weaken with increasing hierarchy depth. We show that the proposed hierarchical schemes are robust: they provide incentives in heterogeneous environments where workers can have limited proficiencies, as long as there are enough proficient workers in the crowd. Interestingly, we also show that for these schemes to work, the only requisite is that workers know their place in the hierarchy in advance.
As part of our study of user work in crowdsourcing and collaborative environments, we also study the problem of authorship attribution in revisioned content such as Wikipedia, where virtually anyone can edit an article. Information about the origin of a contribution is important for building a reputation system, as it can be used to assign reputation to editors according to the quality of their contributions. Since anyone can edit an article, a robust method for attributing a new revision has to analyze all previous revisions of the article. We describe a novel authorship attribution algorithm that scales to very large repositories of revisioned content, as we show via experiments on the English Wikipedia.
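The reputation-based aggregation is described only at a high level in this abstract. The general idea (weight each grade by its grader's reputation, then update each reputation by how well the grader agrees with the consensus) can be sketched as below; this is an illustrative iterative scheme in that spirit, not CrowdGrader's exact algorithm:

```python
import numpy as np

def aggregate_grades(grades, n_iter=10, eps=1e-6):
    """Iterative reputation-weighted aggregation of peer grades.

    grades: dict mapping submission -> {grader: grade}.
    Returns (consensus, reputation). Illustrative sketch only.
    """
    graders = {g for gs in grades.values() for g in gs}
    rep = {g: 1.0 for g in graders}               # start with uniform trust
    consensus = {}
    for _ in range(n_iter):
        # 1) Consensus = reputation-weighted mean of each submission's grades.
        for s, gs in grades.items():
            w = sum(rep[g] for g in gs)
            consensus[s] = sum(rep[g] * x for g, x in gs.items()) / w
        # 2) Reputation = inverse mean squared deviation from the consensus,
        #    so graders who agree with the crowd gain influence.
        for g in graders:
            errs = [(x - consensus[s]) ** 2
                    for s, gs in grades.items()
                    for gg, x in gs.items() if gg == g]
            rep[g] = 1.0 / (np.mean(errs) + eps)
    return consensus, rep
```

A grader who consistently deviates from the consensus loses weight, so their grades contribute less to the final aggregate.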

    Incentives for Truthful Peer Grading

    Peer grading systems work well only if users have incentives to grade truthfully. An example of non-truthful grading, which we observed in classrooms, is students assigning the maximum grade to all submissions. With a naive grading scheme, such as averaging the assigned grades, all students would then receive the maximum grade. In this paper, we develop three grading schemes that provide incentives for truthful peer grading. In the first scheme, the instructor grades a fraction p of the submissions and penalizes students whose grades deviate from the instructor's. We provide lower bounds on p to ensure truthfulness, and conclude that this scheme works only for moderate class sizes, up to a few hundred students. To overcome this limitation, we propose a hierarchical extension of this supervised scheme, and we show that it can handle classes of any size with a bounded (and small) amount of instructor work; it is therefore applicable to Massive Open Online Courses (MOOCs). Finally, we propose unsupervised incentive schemes, in which the student incentive is based on statistical properties of the grade distribution, without any grading required of the instructor. We show that the proposed unsupervised schemes provide incentives for truthful grading, at the price of being possibly unfair to individual students. Comment: 26 page
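The first (supervised) scheme can be sketched minimally as a penalty step: the instructor grades an audited subset, and each student who peer-graded an audited submission is penalized in proportion to their deviation from the instructor's grade. The linear penalty shape and rate below are illustrative assumptions; the paper's contribution is the analysis of the audited fraction p needed for truthfulness:

```python
def supervised_penalties(peer_grades, instructor_grades, penalty_rate=1.0):
    """Penalty step of a supervised peer-grading incentive scheme (sketch).

    peer_grades: dict submission -> {student: grade assigned by student}
    instructor_grades: dict submission -> instructor grade (audited subset)
    Returns dict student -> total penalty accrued on audited submissions.
    """
    penalty = {}
    for s, true_grade in instructor_grades.items():
        # Only submissions the instructor audited generate penalties.
        for student, g in peer_grades.get(s, {}).items():
            penalty[student] = (penalty.get(student, 0.0)
                                + penalty_rate * abs(g - true_grade))
    return penalty
```

A student who always assigns the maximum grade accrues penalties whenever an audited submission deserves less, so truthful grading becomes the optimal strategy once p is large enough.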

    Genetic Structures and Conditions of their Expression, which Allow Receiving Native Recombinant Proteins with High Output

    We investigated the possibility of obtaining native recombinant amyloidogenic proteins by creating genetic constructs encoding fusion proteins of the target proteins with Super Folder Green Fluorescent Protein (sfGFP). In this study, we show that constructs containing the sfGFP gene support synthesis, in a bacterial system, of fusion proteins with minimal formation of inclusion bodies. Constructs containing the genes of the target proteins in the 3'-terminal region of the sfGFP gene, followed by a polynucleotide sequence that enables affinity purification of the fusion proteins, are optimal. Heating bacterial cultures to 42°C for 30 min (heat shock) before inducing expression of the recombinant genes was found to increase the yield of the desired products, practically avoiding the formation of insoluble aggregates.